In computer science, a tail call is a subroutine call performed as the final action of a procedure. If the target of a tail is the same subroutine, the subroutine is said to be tail recursive, which is a special case of direct recursion. Tail recursion (or tail-end recursion) is particularly useful, and is often easy to optimize in implementations.
Tail calls can be implemented without adding a new stack frame to the call stack. Most of the frame of the current procedure is no longer needed, and can be replaced by the frame of the tail call, modified as appropriate (similar to overlay for processes, but for function calls). The program can then jump to the called subroutine. Producing such code instead of a standard call sequence is called tail-call elimination or tail-call optimization. Tail-call elimination allows procedure calls in tail position to be implemented as efficiently as goto statements, thus allowing efficient structured programming. In the words of Guy L. Steele, "in general, procedure calls may be usefully thought of as GOTO statements which also pass parameters, and can be uniformly coded as machine JUMP instructions."
Not all programming languages require tail-call elimination. However, in functional programming languages, tail-call elimination is often guaranteed by the language standard, allowing tail recursion to use a similar amount of memory as an equivalent loop. The special case of tail-recursive calls, when a function calls itself, may be more amenable to call elimination than general tail calls. When the language semantics do not explicitly support general tail calls, a compiler can often still optimize sibling calls, or tail calls to functions which take and return the same types as the caller.
For non-recursive function calls, this is usually an optimization that saves only a little time and space, since there are not that many different functions available to call. When dealing with recursive or mutually recursive functions where recursion happens through tail calls, however, the stack space and the number of returns saved can grow to be very significant, since a function can call itself, directly or indirectly, creating a new call stack frame each time. Tail-call elimination often reduces asymptotic stack space requirements from linear, or Big-O notation(n), to constant, or Big-O notation(1). Tail-call elimination is thus required by the standard definitions of some programming languages, such as Scheme, and languages in the ML family among others. The Scheme language definition formalizes the intuitive notion of tail position exactly, by specifying which syntactic forms allow having results in tail context. Implementations allowing an unlimited number of tail calls to be active at the same moment, thanks to tail-call elimination, can also be called 'properly tail recursive'.
Besides space and execution efficiency, tail-call elimination is important in the functional programming idiom known as continuation-passing style (CPS), which would otherwise quickly run out of stack space.
a(data);
return b(data);
}
Here, both a(data) and b(data) are calls, but b is the last thing the procedure executes before returning and is thus in tail position. However, not all tail calls are necessarily located at the syntactical end of a subroutine:
if (a(data)) {
return b(data);
}
return c(data);
}
Here, both calls to b and c are in tail position. This is because each of them lies in the end of if-branch respectively, even though the first one is not syntactically at the end of bar's body.
In this code:
return a(data) + 1;
}
var ret = a(data);
return ret;
}
var ret = a(data);
return (ret == 0) ? 1 : ret;
}
the call to a(data) is in tail position in foo2, but it is not in tail position either in foo1 or in foo3, because control must return to the caller to allow it to inspect or modify the return value before returning it.
(define (factorial n)
(if (= n 0)
1
(* n (factorial (- n 1)))))
This is not written in a tail-recursive style, because the multiplication function ("*") is in the tail position. This can be compared to:
(define (factorial n)
(fact-iter 1 n))
(define (fact-iter product n)
(if (= n 0)
product
(fact-iter (* product n)
(- n 1))))
This program assumes applicative-order evaluation. The inner procedure fact-iter calls itself last in the control flow. This allows an interpreter or compiler to reorganize the execution which would ordinarily look like this:
call factorial (4) call fact-iter (1 4) call fact-iter (4 3) call fact-iter (12 2) call fact-iter (24 1) return 24 return 24 return 24 return 24 return 24
into the more efficient variant, in terms of both space and time:
call factorial (4) call fact-iter (1 4) replace arguments with (4 3) replace arguments with (12 2) replace arguments with (24 1) return 24 return 24
This reorganization saves space because no state except for the calling function's address needs to be saved, either on the stack or on the heap, and the call stack frame for fact-iter is reused for the intermediate results storage. This also means that the programmer need not worry about running out of stack or heap space for extremely deep recursions. In typical implementations, the tail-recursive variant will be substantially faster than the other variant, but only by a constant factor.
Some programmers working in functional languages will rewrite recursive code to be tail recursive so they can take advantage of this feature. This often requires addition of an "accumulator" argument (product in the above example) to the function.
Rest, Bigs) :-
X @< Pivot, !, partition(Xs, Pivot, Rest, Bigs).partition(X | Rest) :-
partition(Xs, Pivot, Smalls, Rest). | x < p = (x:a,b) | otherwise = (a,x:b)
where
(a,b) = partition xs p
| ||||
Xs,
( X @< Pivot -> partition(Xs,Pivot,Rest,Bigs), Smalls=[X | Rest] ; partition(Xs,Pivot,Smalls,Rest), Bigs=[X | Rest] ). | Xs,
( X @< Pivot -> Smalls=[X | Rest], partition(Xs,Pivot,Rest,Bigs) ; Bigs=[X | Rest], partition(Xs,Pivot,Smalls,Rest)
).
|
Thus in tail-recursive translation such a call is transformed into first creating a new list node and setting its first field, and then making the tail call with the pointer to the node's rest field as argument, to be filled recursively. The same effect is achieved when the recursion is guarded under a lazily evaluated data constructor, which is automatically achieved in lazy programming languages like Haskell.
void *value; struct list *next;} list; list *duplicate(const list *ls) { list *head = NULL;
if (ls != NULL) { list *p = duplicate(ls->next); head = malloc(sizeof *head); head->value = ls->value; head->next = p; } return head;}
(if (not (null? ls)) (cons (car ls) (duplicate (cdr ls))) '())) | ||
Xs,R):-
duplicate(Xs,Ys), R=[X | Ys]. duplicate(,). |
In this form the function is not tail recursive, because control returns to the caller after the recursive call duplicates the rest of the input list. Even if it were to allocate the head node before duplicating the rest, it would still need to plug in the result of the recursive call into the next field after the call. So the function is almost tail recursive. Warren's method pushes the responsibility of filling the next field into the recursive call itself, which thus becomes tail call. Using sentinel head node to simplify the code,
void *value; struct list *next;} list; void duplicate_aux(const list *ls, list *end); list *duplicate(const list *ls) { list head;
duplicate_aux(ls, &head); return head.next;} void duplicate_aux(const list *ls, list *end) { if (ls != NULL) { end->next = malloc(sizeof *end); end->next->value = ls->value; duplicate_aux(ls->next, end->next); } else { end->next = NULL; }}
(let ((head (list 1))) (let dup ((ls ls) (end head)) (cond ((not (null? ls)) (set-cdr! end (list (car ls))) (dup (cdr ls) (cdr end))))) (cdr head))) | ||
Xs,R):-
R=[X | Ys], duplicate(Xs,Ys). duplicate(,). |
The callee now appends to the end of the growing list, rather than have the caller prepend to the beginning of the returned list. The work is now done on the way forward from the list's start, before the recursive call which then proceeds further, instead of backward from the list's end, after the recursive call has returned its result. It is thus similar to the accumulating parameter technique, turning a recursive computation into an iterative one.
Characteristically for this technique, a parent call frame is created on the execution call stack, which the tail-recursive callee can reuse as its own call frame if the tail-call optimization is present.
The tail-recursive implementation can now be converted into an explicitly iterative implementation, as an accumulating loop:
void *value; struct list *next;} list; list *duplicate(const list *ls) { list head, *end; end = &head; while (ls != NULL) { end->next = malloc(sizeof *end); end->next->value = ls->value; ls = ls->next; end = end->next; } end->next = NULL; return head.next;} ;; in Scheme, (define (duplicate ls) (let ((head (list 1))) (do ((end head (cdr end)) (ls ls (cdr ls ))) ((null? ls) (cdr head)) (set-cdr! end (list (car ls)))))) |
However, for language implementations which store function arguments and local variables on a call stack (which is the default implementation for many languages, at least on systems with a hardware stack, such as the x86), implementing generalized tail-call optimization (including mutual tail recursion) presents an issue: if the size of the callee's activation record is different from that of the caller, then additional cleanup or resizing of the stack frame may be required. For these cases, optimizing tail recursion remains trivial, but general tail-call optimization may be harder to implement efficiently.
For example, in the Java virtual machine (JVM), tail-recursive calls can be eliminated (as this reuses the existing call stack), but general tail calls cannot be (as this changes the call stack)." What is difference between tail calls and tail recursion?", Stack Overflow" What limitations does the JVM impose on tail-call optimization", Programmers Stack Exchange As a result, functional languages such as Scala that target the JVM can efficiently implement direct tail recursion, but not mutual tail recursion.
The GCC, Clang, and Intel compiler suites perform tail-call optimization for C and other languages at higher optimization levels or when the -foptimize-sibling-calls option is passed. Though the given language syntax may not explicitly support it, the compiler can make this optimization whenever it can determine that the return types for the caller and callee are equivalent, and that the argument types passed to both function are either the same, or require the same amount of total storage space on the call stack.
Various implementation methods are available.
For compilers generating assembly directly, tail-call elimination is easy: it suffices to replace a call opcode with a jump one, after fixing parameters on the stack. From a compiler's perspective, the first example above is initially translated into pseudo-assembly language (in fact, this is valid x86 assembly):
foo:
call B
call A
ret
Tail-call elimination replaces the last two lines with a single jump instruction:
foo:
call B
jmp A
After subroutine A completes, it will then return directly to the return address of foo, omitting the unnecessary ret statement.
Typically, the subroutines being called need to be supplied with parameters. The generated code thus needs to make sure that the call frame for A is properly set up before jumping to the tail-called subroutine. For instance, on platforms where the call stack does not just contain the return statement, but also the parameters for the subroutine, the compiler may need to emit instructions to adjust the call stack. On such a platform, for the code:
'''function''' foo(data1, data2) B(data1) '''return''' A(data2)
(where data1 and data2 are parameters) a compiler might translate that as:
foo:
mov reg,[sp+data1] ; fetch data1 from stack (sp) parameter into a scratch register.
push reg ; put data1 on stack where B expects it
call B ; B uses data1
pop ; remove data1 from stack
mov reg,[sp+data2] ; fetch data2 from stack (sp) parameter into a scratch register.
push reg ; put data2 on stack where A expects it
call A ; A uses data2
pop ; remove data2 from stack.
ret
A tail-call optimizer could then change the code to:
foo:
mov reg,[sp+data1] ; fetch data1 from stack (sp) parameter into a scratch register.
push reg ; put data1 on stack where B expects it
call B ; B uses data1
pop ; remove data1 from stack
mov reg,[sp+data2] ; fetch data2 from stack (sp) parameter into a scratch register.
mov [sp+data1],reg ; put data2 where A expects it
jmp A ; A uses data2 and returns immediately to caller.
This code is more efficient both in terms of execution speed and use of stack space.
It is possible to implement trampolines using higher-order functions in languages that support them, such as Groovy, Visual Basic .NET and C#.Samuel Jack, Bouncing on your tail. Functional Fun. April 9, 2008.
Using a trampoline for all function calls is rather more expensive than the normal C function call, so at least one Scheme compiler, Chicken, uses a technique first described by Henry Baker from an unpublished suggestion by Andrew Appel,Henry Baker, "CONS Should Not CONS Its Arguments, Part II: Cheney on the M.T.A." in which normal C calls are used but the stack size is checked before every call. When the stack reaches its maximum permitted size, objects on the stack are garbage-collected using the Cheney algorithm by moving all live data into a separate heap. Following this, the stack is unwound ("popped") and the program resumes from the state saved just before the garbage collection. Baker says "Appel's method avoids making a large number of small trampoline bounces by occasionally jumping off the Empire State Building." The garbage collection ensures that mutual tail recursion can continue indefinitely. However, this approach requires that no C function call ever returns, since there is no guarantee that its caller's stack frame still exists; therefore, it involves a much more dramatic internal rewriting of the program code: continuation-passing style.
'''procedure''' foo(''x'') '''if''' ''p''(''x'') '''return''' bar(''x'') '''else''' '''return''' foo(baz(''x''))
into
'''procedure''' foo(''x'') '''while''' '''true''' '''if''' ''p''(''x'') '''return''' bar(''x'') '''else''' ''x'' ← baz(''x'')
where x may be a tuple involving more than one variable: if so, care must be taken in implementing the assignment statement x ← baz( x) so that dependencies are respected. One may need to introduce auxiliary variables or use a swap construct.
More generally,
'''procedure''' foo(''x'') '''if''' ''p''(''x'') '''return''' bar(''x'') '''else if''' ''q''(''x'') '''return''' baz(''x'') ... '''else if''' ''r''(''x'') '''return''' foo(qux(''x'')) ... '''else''' '''return''' foo(quux(''x''))
can be transformed into
'''procedure''' foo(''x'') '''while''' '''true''' '''if''' ''p''(''x'') '''return''' bar(''x'') '''else if''' ''q''(''x'') '''return''' baz(''x'') ... '''else if''' ''r''(''x'') ''x'' ← qux(''x'') ... '''else''' ''x'' ← quux(''x'')
For instance, this Julia program gives a non-tail recursive definition call of the factorial:
if n == 0
return 1
else
return n * factorial(n - 1)
end
end
Indeed, ret wraps the call to fact. But it can be transformed into a tail-recursive definition by adding an argument n * factorial(n - 1) called an accumulator.
This Julia program gives a tail-recursive definition factorial of the factorial:
function factorial(n::Integer)
if n == 0:
return a
else
return factorial(n - 1, n * a)
end
end
return factorial(n, 1)
end
This Julia program gives an iterative definition a of the factorial:
function factorial(n::Integer)
while n > 0
a = n * a
n = n - 1
end
return a
end
return fact_iter(n, one(n))
end
|
|